We have implemented a C-like continuation-based programming language. Continuation based C (CbC) was
implemented using micro-C on various architectures, and we have tried several CbC programming
experiments. Here we report a new implementation of the CbC compiler based on GCC 4.2.3. Since it
contains full C capability, we can use CbC and C in a mixture.



1 A Practical Continuation based Language



If CPS theory is successful, it should also work well in
practice. Our idea is simple. How about a programming
language that has continuation passing style only? How
about making it run as fast as the current GNU C compiler?

Instead of creating a completely new programming language,
we designed a lower-level language below C, called
Continuation based C, hereafter CbC. Using a
CPS-transformation-like method, we can compile C into CbC;
that is, we have a kind of backward compatibility.

We have implemented CbC using micro-C on various
architectures, and we have tried several CbC programming
experiments. Here we report a new partial implementation of
the CbC compiler[5] based on GCC 4.2.3. Since it contains
full C capability, we can use CbC and C in a mixture; we
call this mixture C with C, hereafter CwC.

First we give an overview of the CbC language.



2 Continuation based C 

CbC's basic programming unit is a code segment. It is not a
subroutine, but it looks like a function, because it has
input and output. We can use C structs as input and output
interfaces.

    struct interface1 { int i; };
    struct interface2 { int o; };

    code f(struct interface1 a) {
        struct interface2 b; b.o = a.i;
        goto g(b);
    }



In this example, a code segment f has input a and sends
output b to a code segment g. There is no return from code
segment g; g should call another continuation using goto.
Any control structure in C is allowed in the CwC language,
but in the case of CbC, we restrict ourselves to the if
statement only, because it is sufficient to implement C to
CbC translation. In this case, a code segment has one input
interface and several output interfaces (Figure 1).
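Standard C has neither a code type nor a parameterized goto, so the example above cannot be compiled by an ordinary C compiler. As a rough illustration only, the following sketch imitates the same control flow in plain C with a trampoline loop: each segment returns a description of the next segment instead of jumping to it. The names step, segment, seg_f, seg_g, and run are hypothetical and are not part of CbC.

```c
#include <stddef.h>

/* Interfaces from the example above, written as plain C structs. */
struct interface1 { int i; };
struct interface2 { int o; };

/* Hypothetical "next step" record: which segment runs next, and its input. */
struct step;
typedef struct step (*segment)(void *in);
struct step { segment next; void *in; };

static struct interface2 b_slot;   /* output of f, input of g */
static int result;                 /* value observed by the last segment */

/* Plays the role of code segment g: consumes b and ends the chain. */
static struct step seg_g(void *in) {
    struct interface2 *b = in;
    result = b->o;
    return (struct step){ NULL, NULL };   /* no continuation: stop */
}

/* Plays the role of code segment f: copies a.i to b.o, then "goto g(b)". */
static struct step seg_f(void *in) {
    struct interface1 *a = in;
    b_slot.o = a->i;
    return (struct step){ seg_g, &b_slot };
}

/* Trampoline: keeps moving to the next segment; the C stack never grows. */
static int run(segment start, void *in) {
    struct step s = { start, in };
    while (s.next != NULL)
        s = s.next(s.in);
    return result;
}
```

Calling run(seg_f, &a) with a.i set to 42 passes through seg_f and seg_g and yields 42. A real CbC compiler needs no such loop, because goto transfers control directly.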

The code keyword and the parameterized global goto statement
are extensions of Continuation based C. Unlike C--[4]'s
parameterized goto, we cannot goto into a normal C function.

Figure 1: code

2.1 Intermix with C

In CwC, we can go to a code segment from a C function and we
can call C functions in a code segment. So we don't have to
shift completely from C to CbC. The latter is
straightforward, but the former needs further extensions.

    void *env;
    code (*exit)(int);

    code h(char *s) {
        printf(s);
        goto (*exit)(0), env;
    }

    int main() {
        env = environment;
        exit = return;
        goto h("hello World\n");
    }

In this hello world example, the environment of main() and
its continuation are kept in global variables. The
environment and the continuation can be obtained using
environment and return. Arbitrary mixtures of code segments
and functions are allowed (in CwC). The continuation of a
goto statement never returns to the original function, but
goes to the caller of the original function. In this case,
it returns the result to the operating system.

3 What's good?

CbC is a kind of high-level assembler language. It can do
several things that the original C language cannot do. For
example,

    Thread Scheduler
    Context Switch
    Synchronization Primitives
    I/O wait semantics

are impossible to write in C. Usually they require some help
from assembler language, such as the asm statement
extension, which is of course not portable.

3.1 Scheduler example

We can easily write these things in CbC, because CbC has no
hidden information behind the stack frame of C. A thread
simply goes to the scheduler,

    goto scheduler(self, task_list);

and the scheduler simply passes control to the next thread
in the task queue.

    code scheduler(Thread self, TaskPtr list)
    {
        TaskPtr t = list;
        TaskPtr e;
        list = list->next;
        goto list->thread->next(list->thread, list);
    }

Of course it is a simulator, but it is also an
implementation. If we have a CPU resource API, we can write
a real multi-CPU scheduler in CbC.

This is impossible in C, because we cannot access the hidden
stack, which is necessary to switch in the scheduler. In
CbC, everything is visible, so we can switch threads very
easily.

This means we can use CbC as an executable specification
language for OS APIs.
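The idea of a scheduler that works only on explicit state can be approximated in plain C. This is a minimal sketch, not the CbC scheduler: it assumes each thread is a function that runs one slice and then returns, so no hidden stack has to be saved. The names task, thread_fn, worker, and trace are hypothetical.

```c
#include <stddef.h>

#define MAX_LOG 16

struct task;
/* A thread body: runs one slice, returns nonzero when the thread is done. */
typedef int (*thread_fn)(struct task *self);

struct task {
    thread_fn run;      /* where to resume this thread */
    int id;
    int remaining;      /* all thread state is explicit */
    struct task *next;  /* circular task queue */
};

static int trace[MAX_LOG];   /* order in which thread slices ran */
static int trace_len;

/* Example thread: records its id and finishes after `remaining` slices. */
static int worker(struct task *self) {
    if (trace_len < MAX_LOG)
        trace[trace_len++] = self->id;
    return --self->remaining == 0;
}

/* Round-robin: run the current task, unlink it when done, else move on. */
static void scheduler(struct task *list) {
    struct task *prev = list;
    while (prev->next != list)      /* find the tail of the circular queue */
        prev = prev->next;

    struct task *cur = list;
    while (cur != NULL) {
        int done = cur->run(cur);
        if (done) {
            if (cur->next == cur) {
                cur = NULL;             /* last task finished */
            } else {
                prev->next = cur->next; /* unlink the finished task */
                cur = cur->next;
            }
        } else {
            prev = cur;
            cur = cur->next;
        }
    }
}
```

With two tasks, one needing two slices and the other one slice, the slices run alternately until both finish. In CbC the loop disappears, because each thread jumps to the scheduler with goto and the scheduler jumps to the next thread.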
3.2 Self Verification

Since we can write a scheduler in CbC, we can also enumerate
all possible interleavings of a concurrent program. We have
implemented a model checker in CwC. CbC can be a
self-verifiable language[1].

SPIN[3] is a very reliable model checker, but it has to use
a special specification language, PROMELA. We cannot
directly use PROMELA as an implementation language, and it
is slightly difficult to study its concurrent execution
semantics, including communication ports.

There is another kind of model checker for real programming
languages, such as Java PathFinder[2]. Java PathFinder uses
the Java Virtual Machine (JVM) for state space enumeration,
which is sometimes very expensive.

In CbC, the state enumerator itself is written in CbC, and
its concurrency semantics is written in CbC itself. Besides,
it is very close to the implementation. Actually, we can use
CbC as an implementation language. Since the enumerator is
written in the application itself, we can perform
abstraction or approximation in an application-specific
way, which is a little difficult in Java PathFinder,
although it is possible to handle the JVM API for the
purpose.

We can use CPS-transformed CbC source code for verification,
but we don't have to transform all of the source code,
because CwC supports all C constructs. (But not C++.
Theoretically it is possible using a cfront converter, but
it should be difficult.)

3.3 As a target language

Now that we have a GCC implementation of CbC, it runs very
fast. Many popular languages are implemented on top of C.
Some of them use a very large switch statement for the byte
code interpreter. We don't have to use these hacks when we
use CbC as an implementation language.

CbC is naturally similar to state charts. This means it is
very close to UML diagrams, although CbC does not have
object-oriented features such as message passing or
inheritance, which are not crucial in UML.

4 Transformation (C2CbC)

Conversion from C to CbC is straightforward, but it
generates a lot of code segments. Since CbC does not have
heap management itself, the stack area has to be allocated
explicitly.
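What explicit stack allocation means can be shown with a small plain C sketch. It assumes a downward-growing stack kept in a static byte array and a hypothetical frame layout holding two saved integers. The names frame, stack_area, stack_top, push_frame, and pop_frame are illustrative only. memcpy is used so that the sketch does not depend on the alignment of the byte array.

```c
#include <stddef.h>
#include <string.h>

typedef char *stack;

/* Hypothetical frame: values a caller needs again after the callee ends. */
struct frame { int i; int k; };

static char stack_area[1024];      /* explicitly allocated stack area */

/* Initial stack pointer: one past the end; the stack grows downward. */
static stack stack_top(void) {
    return stack_area + sizeof stack_area;
}

/* Reserve room for one frame and store the saved values in it. */
static stack push_frame(stack sp, const struct frame *f) {
    sp -= sizeof *f;
    memcpy(sp, f, sizeof *f);
    return sp;
}

/* Read the saved values back and release the frame. */
static stack pop_frame(stack sp, struct frame *out) {
    memcpy(out, sp, sizeof *out);
    return sp + sizeof *out;
}
```

A converted caller would push a frame before jumping to the callee, and the continuation segment would pop it to recover the saved values.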

We find that GCC can perform better optimization on
translated code segments. We will discuss this later.

We have a simple implementation of C to CbC compilation, but
it is not yet at a practical level; we need a good converter
for backward compatibility.

We can also consider a possible conversion from C++ to CbC.
In this case, all hidden operations in C++ should become
explicit, for example, object allocations and deallocations
on the stack, handling of auto pointers, and so on.

5 GNU CC implementation 

So how do we implement CwC in GCC? The idea itself is
simple: forcing C tail call elimination for all code
segments.

Before GCC version 4.x, tail call elimination (hereafter
TCE) was not so cleanly implemented, and it was very
difficult to implement this idea. In GCC 4.x, TCE can
basically be applied to all possible functions.

code is implemented as a new type keyword in GCC. You may
think of code as an attribute of a function, which means
that the function can be called with tail call elimination
only.

Because of this implementation, we can actually call a code
segment with a normal function call.

5.1 How to force tail call elimination 

There are many enabling conditions for tail call
elimination: for example, there should be no statement after
the tail call, the return value types have to be the same,
the argument sizes should be compatible, and so on. We find
that almost half of the lines in calls.c are spent checking
TCE possibilities.
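As an illustration of these conditions in ordinary C (not CbC), the sketch below contrasts a call that is in tail position with one that is not. The function names are hypothetical. Whether the compiler actually eliminates the first call depends on the optimization level and target; in GCC the relevant option is -foptimize-sibling-calls.

```c
/* Tail form: the recursive call is the last action, its result is
   returned unchanged, and caller and callee have the same signature. */
static unsigned long sum_to_acc(unsigned long n, unsigned long acc) {
    if (n == 0)
        return acc;
    return sum_to_acc(n - 1, acc + n);   /* candidate for TCE */
}

/* Not tail form: the addition still has to run after the call returns,
   so the caller's frame must stay alive. */
static unsigned long sum_to_plain(unsigned long n) {
    if (n == 0)
        return 0;
    return n + sum_to_plain(n - 1);
}
```

Both functions compute the same sum, but only the first can be turned into a jump. A code segment is required to always be in the first form.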

Our conclusion is this. It is not practical to make sure
that all the TCE tests pass; instead, we wrote a TCE-only
version of expand_call() separately in 783 lines.

    lines  words   bytes  file
     4463  18527  145469  calls.c     expand_call() for functions
      783   2935   23651  cbc-goto.h  expand_cbc_goto() for code segments

All code segments have the same virtual argument size and
void return type; that is, the argument registers or the
argument values in memory are shared among all code
segments. This leads to a problem.
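The problem with a shared argument area can be sketched in plain C. This is a simplified model, assuming that every argument occupies one integer slot and that a goto permutes the slots in place; real arguments may be structs of different sizes. The names assign_sequential and assign_parallel are hypothetical.

```c
#include <string.h>

#define NSLOT 6

/* Naive lowering: new argument d is taken from old slot perm[d] and
   written straight into the shared area, so later reads may see
   values that were already overwritten. */
static void assign_sequential(int slot[NSLOT], const int perm[NSLOT]) {
    for (int d = 0; d < NSLOT; d++)
        slot[d] = slot[perm[d]];
}

/* Parallel assignment: read every source first, then write them all. */
static void assign_parallel(int slot[NSLOT], const int perm[NSLOT]) {
    int tmp[NSLOT];
    for (int d = 0; d < NSLOT; d++)
        tmp[d] = slot[perm[d]];
    memcpy(slot, tmp, sizeof tmp);
}
```

If the first two slots are swapped, the sequential version writes the second value into the first slot and then copies that same value back, so the original first value is lost. The parallel version keeps both, at the cost of a temporary copy.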

5.2 Parallel Assignment

Consider the following code:

    code carg4(struct arg args0, struct arg args1,
               int i, int j, int k, int l) {
        goto carg5(args1, args0, j, k, l, i);
    }

In this case, simple sequential assignment does not work,
because it overwrites args1 or args0. In the normal function
case, GCC simply gives up TCE and pushes all arguments into
new registers or a new stack area. We are not allowed to do
that. That is, we have to implement parallel assignment in
the code segment goto.

This is done by simply copying the overlapping arguments
onto the stack. We hope that the unnecessary copies are
eliminated during GCC optimization.

5.3 Not yet done

Currently we have not yet implemented goto with environment,
nor return and environment.

On some architectures supported by GCC 4.x, TCE itself is
not supported in special cases. Our method does not work on
those architectures.

Since we made modifications to the GCC compiler itself, our
method is sensitive to the GCC version. We will have to make
the necessary modifications for each new version of GCC.

6 Result

Here is our benchmark program.

    f0(int i) {
        int k, j;
        k = 3+i;
        j = g0(i+3);
        return k+4+j;
    }

    g0(int i) {
        return h0(i+4)+i;
    }

    h0(int i) {
        return i+4;
    }

It is written in C; we performed the CPS transformation in
several steps by hand. Several optimizations are possible.

    /* straight conversion case (1) */

    typedef char *stack;

    struct cont_interface {
        // General Return Continuation
        code (*ret)();
    };

    code f(int i, stack sp) {
        int k, j;
        k = 3+i;
        goto f_g0(i, k, sp);
    }



    struct f_g0_interface {
        // Specialized Return Continuation
        code (*ret)();
        int i_, k_, j_;
    };

    code f_g1(int j, stack sp);

    code f_g0(int i, int k, stack sp) { // Caller
        struct f_g0_interface *c =
            (struct f_g0_interface *)(
                sp -= sizeof(struct f_g0_interface));

        c->ret = f_g1;
        c->k_ = k;
        c->i_ = i;

        goto g(i+3, sp);
    }

    code f_g1(int j, stack sp) { // Continuation
        struct f_g0_interface *c =
            (struct f_g0_interface *)sp;
        int k = c->k_;
        sp += sizeof(struct f_g0_interface);
        c = (struct f_g0_interface *)sp;
        goto (c->ret)(k+4+j, sp);
    }

    code g_h1(int j, stack sp);

    code g(int i, stack sp) { // Caller
        struct f_g0_interface *c =
            (struct f_g0_interface *)(
                sp -= sizeof(struct f_g0_interface));

        c->ret = g_h1;
        c->i_ = i;

        goto h(i+3, sp);
    }

    code g_h1(int j, stack sp) { // Continuation
        struct f_g0_interface *c =
            (struct f_g0_interface *)sp;
        int i = c->i_;
        sp += sizeof(struct f_g0_interface);
        c = (struct f_g0_interface *)sp;
        goto (c->ret)(j+i, sp);
    }

    code h(int i, stack sp) {
        struct f_g0_interface *c =
            (struct f_g0_interface *)sp;
        goto (c->ret)(i+4, sp);
    }

    struct main_continuation {
        // General Return Continuation
        code (*ret)();
        code (*main_ret)();
        void *env;
    };

    code main_return(int i, stack sp) {
        if (loop-- > 0)
            goto f(233, sp);
        printf("#0103:%d\n", i);
        goto (((struct main_continuation *)sp)->main_ret)(0),
            ((struct main_continuation *)sp)->env;
    }

This is awfully long, but it is straightforward. Several
forward prototypes are necessary, and we find strict
prototyping painful in CbC, because we have to use many code
segments to perform a simple thing. CbC is not a language
for humans, but for automatic generation, verification, or
IDE-directed programming.

We can shorten the result in this way.

    /* little optimized case (3) */

    code f2_1(int i, char *sp) {
        int k, j;
        k = 3+i;
        goto g2_1(k, i+3, sp);
    }

    code g2_1(int k, int i, char *sp) {
        goto h2_11(k, i+4, sp);
    }

    code f2_0_1(int k, int j, char *sp);

    code h2_1_1(int i, int k, int j, char *sp) {
        goto f2_0_1(k, i+j, sp);
    }

    code h2_11(int i, int k, char *sp) {
        goto h2_1_1(i, k, i+4, sp);
    }

    code f2_0_1(int k, int j, char *sp) {
        goto (((struct cont_interface *)sp)->ret)(k+4+j, sp);
    }

    code main_return2_1(int i, stack sp) {
        if (loop-- > 0)
            goto f2_1(233, sp);
        printf("#0165:%d\n", i);
        goto (((struct main_continuation *)sp)->main_ret)(0),
            ((struct main_continuation *)sp)->env;
    }

In this example, the CPS-transformed source is faster than
the original function call form. There is not much room for
optimization in the function call form, because the function
call API has to be strict. CbC does not need a standard call
API other than the interface, which is simply a struct, and
there is no need for register saves. (This benchmark is
designed to require register saves.)

Here is the result on the IA32 architecture (Table 1).
Micro-C is our previous implementation in tiny C. conv1 1 is
the function call version; conv1 2 and conv1 3 are the
optimized CPS-transformed sources.





                   ./conv1 1   ./conv1 2   ./conv1 3
    Micro-C           8.97        2.19        2.73
    GCC               4.87        3.08        3.65
    GCC (+omit)       4.20        2.25        2.76
    GCC (+fast)       3.44        1.76        2.34

Table 1: Micro-C, GCC benchmark (in sec)



There are two optimization flags for GCC.
-fomit-frame-pointer eliminates the frame pointer (%ebp).
The frame pointer itself is useful in a code segment, but it
generates unnecessary push and pop or leave instructions.
Using the fastcall option, GCC ignores the standard calling
convention, such as the requirement that all arguments have
to be on the stack on IA32. In the Micro-C implementation,
these optimizations are naturally implemented in code
segments, so it is faster than GCC without these options.

But with these options, GCC is faster than Micro-C. Of
course, for more complex sources, GCC's complex
optimizations should work well.

7 Conclusion

We have designed and implemented a continuation-based
language for practical use. We have a partial implementation
of CwC using GCC 4.2.3. Using suitable optimization options,
CPS-transformed source sometimes runs faster than the
original function call version.

This GCC implementation should be portable to all
architectures supporting tail call elimination, but we have
tested it only on i386 so far.